August, 2023

Zoom Electric: Sales Analysis

Problem Solving Growth Analyst Business Sense SQL Excel

I worked on the Zoom Electric- Sales Analysis project. I will be diving into the background, my full process of extracting and analyzing the data, along with my final summary and insights of the data.

Tools Used: MySQL | Microsoft Excel
Quick Links: SQL Query | GitHub Repository
Below is a table of contents in case you want to go to any section.

Table Contents:

  1. Background
  2. Data Dictionary
  3. Process
    1. Microsoft Excel
    2. MySQL
    3. Misc.
  4. Finished Project
  5. Overview of Data
  6. Insights
  7. What I Learned
  8. Resources

BACKGROUND

Zoom Electric (ZE) is an electric vehicle venture owned by a renowned automotive manufacturer called ZOOM MOTORS India Private Limited.
Zoom Electric offers a range of automotive and scooter products, including FioNex, Sprint, and Parker, along with some exclusive limited-edition variants.
Recently, ZE introduced a new scooter model called Sprint in the 2-wheeler EV scooter segment. Initially, the sales numbers for Sprint Scooters were quite impressive during the first two weeks of its launch. However, unexpectedly, there was a sudden decline in sales thereafter.
My Role: In this scenario I am working as an analyst in the sales & operation team at Zoom Electric (ZE). My team is responsible for executing the pre-orders, sales client visits, and website sales, and analyze the visit session created on the website, and analyzing the sales funnel. And my team has been tasked with the overall goal (see below) of sales & operation strategies.
Overall Goal: As a data analyst, I have been assigned to validate the sales drop for Sprint scooters and figure out the reasons for this decline.
Approach: To find the reason for decline in sales precisely, the overall goal is divided into three sub-goals:

  • Quantify the drop in sales volume, to validate if there was a drop in sales volume.
  • Assess whether there is a relation between the launch date and sales growth.
  • Assess the effectiveness of email campaign in influencing customers’ purchasing decisions.
  • DATA DICTIONARY

    Entity Relationship Diagram

    schema diagram

    1. Sales

    A table to show details regarding sales.

    Table Schema
    ```markdown |Column Name |Data Type|Description | |----------------------|---------|-----------------------------------------------------------------------------| |customer_id |INTEGER |Each and every customer logged in on Zoom Electric are assigned a uniqe ID | |product_id |INTEGER |An ID of product (can be duplicates) | |sales_transaction_date|TIMESTAMP|Date and time of sales transaction done for product_id | |channel |VARCHAR |Sales channel (offline or online) offline - dealers across the various cities| |dealership_id |INTEGER |Code of dealer through which sales done |
    Sample Data
    ```markdown |customer_id|product_id|sales_transaction_date|channel |dealership_id| |-----------|----------|----------------------|----------|-------------| |1 |7 |2017-07-19 08:38:00 |internet | | |22 |7 |2017-08-14 09:59:00 |dealership|20 | |145 |7 |2019-01-20 10:40:00 |internet | | |289 |7 |2017-05-09 14:20:00 |dealership|7 | |331 |7 |2019-05-21 20:03:00 |dealership|4 |

    2. Products

    A table to show details of the products.

    Table Schema
    ```markdown |Column Name |Data Type|Description | |---------------------|---------|---------------------------------------------------------------------------------| |product_id |INTEGER |Primary Key for the product table (unique identifiers for Zoom Electric products)| |model |VARCHAR |Model for the products | |year |INTEGER |Model Year for the products | |product_type |VARCHAR |Type of products- scooter, automobile | |base_price |FLOAT |price of the products | |production_start_date|TIMESTAMP|Date on which production of the product starts in Zoom Electric | |production_end_date |TIMESTAMP|Date on which production of the product end or closed in Zoom Electric |
    Sample Data
    ```markdown |product_id|model |year|product_type|base_price|production_start_date|production_end_date| |----------|----------------------|----|------------|----------|---------------------|-------------------| |5 |Parker |2014 |scooter |111998 |23-06-2014 00:00 |27-01-2015 00:00 | |6 |Corpel |2015 |automobile |524000 |15-04-2015 00:00 |01-10-2018 00:00 | |7 |Sprint |2016 |scooter |95998 |10-10-2016 00:00 | | |8 |Sprint Limited Edition|2017 |scooter |111998 |15-02-2017 00:00 | | |9 |SigniaSafari |2017 |automobile |280000 |15-02-2017 00:00 | |

    3. Emails

    A table to show data of email marketing.

    Table Schema
    ```markdown |Column Name |Data Type|Description | |----------------|---------|----------------------------------------------------------------------------------------| |email_id |INTEGER |Primary Key for the email table (unique identifiers for all the email sent to customers)| |customer_id |INTEGER |An ID of customer (can be duplicates) | |email_subject_id|INTEGER |An ID of Email subject (can be duplicates) | |opened |BOOLEAN |Check if email opened or not, 't'- opened, 'f'- not opened | |clicked |BOOLEAN |'t'- clicked, 'f'- not clicked | |bounced |BOOLEAN |'t'- bounced, 'f'- not bounced | |sent_date |TIMESTAMP|Date on which email was sent | |opened _date |TIMESTAMP|Date on which email was opened | |clicked_date |TIMESTAMP|Date on which customer clicked the link to land on the ZOOM ELECTRIC landing page |
    Sample Data
    ```markdown |email_id|customer_id|opened|clicked|bounced|sent_date |opened_date |clicked_date |email_subject_id| |--------|-----------|------|-------|-------|-------------------|----------------|----------------|----------------| |54 |665 |t |f |f |2011-01-03 15:00:00|04-01-2011 18:35| |6 | |55 |666 |f |f |f |2011-01-03 15:00:00| | |6 | |56 |673 |t |t |f |2011-01-03 15:00:00|04-01-2011 12:03|04-01-2011 12:06|6 | |57 |694 |f |f |f |2011-01-03 15:00:00| | |6 | |58 |701 |t |f |f |2011-01-03 15:00:00|04-01-2011 08:35| |6 |

    4. Email subject

    A table to show details of employees.

    Table Schema
    ```markdown |Column Name |Data Type|Description | |----------------|---------|----------------------------------| |email_subject_id|INTEGER |Primary Key for the email subjects| |email_subject |VARCHAR |Descriptions of emails |
    Sample Data
    ```markdown |email_subject_id|email_subject | |----------------|--------------------------------------| |1 |25% off all EVs. It's a Christmas Time| |2 |A Brand New Scooter...and Car | |3 |A New Year, And Some New EVs | |4 |An Electric Car for a New Age | |5 |Green Cars , Green Friday Offer |

    PROCESS

    Overview: I first familiarized myself with the dataset, tables and fields separately in Excel, then used MySQL to create a database using those datasets to extract the required data. Finally I returned to Excel to analyse those extracted data using pivot tables and graphs.

    Microsoft Excel

    I downloaded all the CSV files and familiarized myself with the data first.
    Then I used MySQL to create database for data extraction purpose because the files were quite large to filter and processing those files in Excel would have taken a long time. And then I came back to Excel to analyse the data, I did this 3 times for each sub-goal.
    I've streamlined the comprehension of process section by adding goto tags.

    1. Quantify the decline in sales growth.  (goto extraction)

    sprint sales graph You can see there are (-ve) values in the growth column that confirms there has been a decline in sales volume.

    2. Assess the relationship between launch date and sales growth.  (goto extraction)

    sprint vs sprint-le sales graph The Sprint scooter came out in 4th quarter whereas every other scooter models came out in the 1st or 2nd quarter.
    At the start, the Sprint scooter sold more in the first days, but later on, the Sprint LE caught up and did even better. The Sprint scooter's sales dropped quickly after 10 days, while the Sprint LE's sales went down more slowly. The Sprint scooter sold more each day on average. In the first week, the Sprint grew faster, but then slowed down. The Sprint LE's growth stayed steady. Eventually, the Sprint scooter's growth went down, while the Sprint LE kept growing.

    3. Assess the effectiveness of email campaign in influencing customers’ purchasing decisions.  (goto extraction)

    emails campaign The CTR is the percentage of people who clicked on a link in the email, while the open rate is the percentage of people who opened the email.
    The CTR is 2%, which is below the industry benchmark of 8%. This means that only 2% of the people who received the email clicked on a link in the email. The open rate is 20%, which is slightly above the industry benchmark of 18%. This means that 20% of the people who received the email opened it.

    MySQL

    I originally wanted to use Excel but the files were too big to filter and it was difficult to process for my computer. Instead I used SQL to extract the data because it could handle all of the information quicker than Excel, and I wanted to work on my SQL-querying skills. Below is my general process in SQL, I didn't include my mistakes/missteps or errors for the sake of brevity.

    View my full code on my GitHub for this project here.

    Uploaded all of the original data from the data source into MySQL database, used MySQL Workbench to import all individual CSV files and save them in separate tables and created a database.

    I extracted 3 data tables based on the sub-goals:

    1. Quantify the decline in sales growth.

    Extracted the data relevant to the business problem ie, no. of “Sprint” scooters sold in the first 3 weeks (21 days) of its production start date and named it sprint_sales with following structure.
    date – to store the sales transaction date.
    unit_sold – to calculate and store the total no. of sprint scooter sold on each date.
    Added 3 new columns:
    cumm_7D – to calculate rolling sum of 7 days period of unit_sold for consistent evaluation of sales performance throughout the week.
    cumm_7D_prev – to store the previous day’s value of cumm_7D
    growth – to calculate the growth% or change in sales of current day and previous day, a positive value shows increase in sales while a negative value shows a decline in sales, to calculate growth please refer to this formula:
    growth = 100 * (cumm_7D - cumm_7D_prev) / cumm_7D_prev
    Note: The sales_transaction_date column was of datetime date-type so I had to cast DATE() function in order to get total unit sold on each day.

    view query ```sql -- Extracting Sprint scooters sales data of first 3-weeks WITH sprint_sales AS ( SELECT ROW_NUMBER() OVER(ORDER BY sales_transaction_date) AS day, DATE(sales_transaction_date) AS date, COUNT(*) AS unit_sold FROM sales JOIN products ON sales.product_id = products.product_id WHERE products.model = 'Sprint' GROUP BY 2 LIMIT 21 ), -- Calculating cumulative sales over a rolling 7-day period cumm_sales AS ( SELECT *, SUM(unit_sold) OVER(ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS cumm_7D FROM sprint_sales ) -- Calculating the rate of growth SELECT date, unit_sold, cumm_7D, cumm_7D_prev, CONCAT(ROUND((100 * (cumm_7D - cumm_7D_prev) / cumm_7D_prev), 2), '%') AS growth FROM ( -- Calculating previous value of cumulative sales to calculate rate of growth SELECT *, LAG(CASE WHEN day >= 7 THEN cumm_7D END) OVER(ORDER BY day) AS cumm_7D_prev FROM cumm_sales) AS cumm_sales_prev;
    view data ```markdown | date | unit_sold | cumm_7D | cumm_7D_prev | growth | |------------|-----------|---------|--------------|---------| | 2016-10-10 | 9 | 9 | NULL | NULL | | 2016-10-11 | 6 | 15 | NULL | NULL | | 2016-10-12 | 10 | 25 | NULL | NULL | | 2016-10-13 | 10 | 35 | NULL | NULL | | 2016-10-14 | 5 | 40 | NULL | NULL | | 2016-10-15 | 10 | 50 | NULL | NULL | | 2016-10-16 | 14 | 64 | NULL | NULL | | 2016-10-17 | 9 | 64 | 64 | 0.00% | | 2016-10-18 | 11 | 69 | 64 | 7.81% | | 2016-10-19 | 12 | 71 | 69 | 2.90% | | 2016-10-20 | 10 | 71 | 71 | 0.00% | | 2016-10-21 | 6 | 72 | 71 | 1.41% | | 2016-10-22 | 2 | 64 | 72 | -11.11% | | 2016-10-23 | 5 | 55 | 64 | -14.06% | | 2016-10-24 | 6 | 52 | 55 | -5.45% | | 2016-10-25 | 9 | 50 | 52 | -3.85% | | 2016-10-26 | 2 | 40 | 50 | -20.00% | | 2016-10-27 | 4 | 34 | 40 | -15.00% | | 2016-10-28 | 7 | 35 | 34 | 2.94% | | 2016-10-29 | 5 | 38 | 35 | 8.57% | | 2016-10-30 | 5 | 38 | 38 | 0.00% |
    (goto analysis)

    2. Assess the relationship between launch date and sales growth.

    To find the relationship between launch date and sales growth, I extracted the sales data for another scooter model named “Sprint Limited Edition” to compare with “Sprint” model. Sprint Limited Edition’s sales data table has same structure as Sprint’s sales data table.
    Then I joined both the data tables and created a new table of following structure.
    day – to store the no. of days(1-21) since the date column values in both the tables were different so it was of no use.
    sprint – to store the unit_sold column values from Sprint sales data table.
    sprint_le - to store the unit_sold column values from Sprint Limited Edition sales data table.
    sprint_cl7 - to store the cumm_7D column values from Sprint sales data table.
    sprint_le_cl7 - to store the cumm_7D column values from Sprint Limited Edition sales data table.
    growth_sprint - to store the growth column values from Sprint sales data table.
    growth_sprint_le - to store the growth column values from Sprint Limited Edition sales data table.

    view query ```sql -- Extracting Sprint scooters sales data of first 3-weeks WITH sprint_sales AS ( SELECT ROW_NUMBER() OVER(ORDER BY sales_transaction_date) AS day, DATE(sales_transaction_date) AS date, COUNT(*) AS sprint FROM sales JOIN products ON sales.product_id = products.product_id WHERE model = 'Sprint' GROUP BY 2 LIMIT 21 ), -- Extracting Sprint Limited Edition variant sales data of first 3-weeks sprintle_sales AS ( SELECT ROW_NUMBER() OVER(ORDER BY sales_transaction_date) AS day, DATE(sales_transaction_date) AS date, COUNT(*) AS sprint_le FROM sales JOIN products ON sales.product_id = products.product_id WHERE model = 'Sprint Limited Edition' GROUP BY 2 LIMIT 21 ), -- Calculating cumulative sales Of both the variants over a rolling 7-day period cumm_sales AS ( SELECT sprint_sales.day AS day, sprint, sprint_le, SUM(sprint) OVER(ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sprint_cl7, SUM(sprint_le) OVER(ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sprint_le_cl7 FROM sprint_sales JOIN sprintle_sales ON sprint_sales.day = sprintle_sales.day ) -- Calculating rate of growth of both the variants for comparision SELECT day, sprint, sprint_le, sprint_cl7, sprint_le_cl7, CONCAT(ROUND(100 * (sprint_cl7 - sprint_cl7_prev) / sprint_cl7_prev, 2), '%') AS sprint_growth, CONCAT(ROUND(100 * (sprint_le_cl7 - sprint_le_cl7_prev) / sprint_le_cl7_prev, 2), '%') AS sprint_le_growth FROM ( -- Calculating previous value of cumulative sales to calculate rate of growth SELECT *, LAG(CASE WHEN day >= 7 THEN sprint_cl7 END) OVER(ORDER BY day) AS sprint_cl7_prev, LAG(CASE WHEN day >= 7 THEN sprint_le_cl7 END) OVER(ORDER BY day) AS sprint_le_cl7_prev FROM cumm_sales ) AS cumm_sales_prev;
    view data ```markdown | day | sprint | sprint_le | sprint_cl7 | sprint_le_cl7 | sprint_growth | sprint_le_growth | |-----|--------|-----------|------------|---------------|---------------|------------------| | 1 | 9 | 6 | 9 | 6 | NULL | NULL | | 2 | 6 | 2 | 15 | 8 | NULL | NULL | | 3 | 10 | 1 | 25 | 9 | NULL | NULL | | 4 | 10 | 4 | 35 | 13 | NULL | NULL | | 5 | 5 | 5 | 40 | 18 | NULL | NULL | | 6 | 10 | 6 | 50 | 24 | NULL | NULL | | 7 | 14 | 5 | 64 | 29 | NULL | NULL | | 8 | 9 | 4 | 64 | 27 | 0.00% | -6.90% | | 9 | 11 | 6 | 69 | 31 | 7.81% | 14.81% | | 10 | 12 | 2 | 71 | 32 | 2.90% | 3.23% | | 11 | 10 | 2 | 71 | 30 | 0.00% | -6.25% | | 12 | 6 | 2 | 72 | 27 | 1.41% | -10.00% | | 13 | 2 | 4 | 64 | 25 | -11.11% | -7.41% | | 14 | 5 | 4 | 55 | 24 | -14.06% | -4.00% | | 15 | 6 | 5 | 52 | 25 | -5.45% | 4.17% | | 16 | 9 | 1 | 50 | 20 | -3.85% | -20.00% | | 17 | 2 | 3 | 40 | 21 | -20.00% | 5.00% | | 18 | 4 | 8 | 34 | 27 | -15.00% | 28.57% | | 19 | 7 | 4 | 35 | 29 | 2.94% | 7.41% | | 20 | 5 | 7 | 38 | 32 | 8.57% | 10.34% | | 21 | 5 | 7 | 38 | 35 | 0.00% | 9.38% |
    (goto analysis)

    3. Assess the effectiveness of email campaign in influencing customers’ purchasing decisions.

    Analyzing the email opening rate and click-through rates, as well as comparing them to industry benchmarks, will enable the evaluation of the effectiveness and quality of the email campaign. This will provide insights into how well the campaign engages recipients, whether they open the emails, and if they click on the provided links, helping to measure the overall impact and success of the campaign.
    I extracted the data from emails table where the sent date was between the production start date of “Sprint” model and two months before it.
    Then, I summarized those data in a different table called sprint_camp_summary with following structure:
    email_sent – to store the total count of rows sprint_campaign table.
    clicked – to store the total count of rows where clicked values were 't' in sprint_campaign table.
    opened – to store the total count of rows where opened values were 't' in sprint_campaign table.
    bounced – to store the total count of rows where bounced values were 't' in sprint_campaign table.
    I further summarized it to calculate the campaign Key Performance Indicators (KPI) to compare it with industry benchmark and named it campaign_performance_kpi with following structure:
    click_rate – to calculate and store the click through rate of emails clicked
    open_rate – to calculate and store the emails opening rate.
    To calculate the click_rate and open_rate please refer the formula below:
    click_rate = 100 * (clicked / (email_sent – bounced))
    open_rate = 100 * (opened / (email_sent – bounced))

    view query ```sql -- Extracting E-mail marketing data of Sprint scooters WITH sprint_campaign AS ( SELECT * FROM emails WHERE sent_date -- assuming the campaign started 2 months before its production start date BETWEEN (SELECT production_start_date FROM products WHERE model = 'Sprint') - INTERVAL 2 MONTH AND (SELECT production_start_date FROM products WHERE model = 'Sprint') ) -- Calculating the click through rate and email opening rate to compare with industry benchmark SELECT CONCAT(ROUND(100 * clicked / (email_sent - bounced)), '%') AS click_rate, CONCAT(ROUND(100 * opened / (email_sent - bounced)), '%') AS open_rate FROM ( -- Summarizing sprint_campaign data by counting the no. of emails- -- sent, clicked, opened & bounced for further calculation SELECT COUNT(*) AS email_sent, COUNT(CASE WHEN clicked = 't' THEN 1 END) AS clicked, COUNT(CASE WHEN opened = 't' THEN 1 END) AS opened, COUNT(CASE WHEN bounced = 't' THEN 1 END) AS bounced FROM sprint_campaign) AS campaign_summary;
    view data ```markdown | click_rate | open_rate | |------------|-----------| | 2% | 20% |
    (goto analysis)

    Misc.

    Microsoft Word

    • Resources - A list of resources I frequently used.
    • Notes - Notes for the project including the final insights, what I was looking for, and anything else having to do with the project.

    FINISHED PROJECT

    Here is my finished project: Zoom Electric: Sales Analysis.
    View my SQL queries on Github used for data extraction here and the Excel files for analysis here.

    OVERVIEW OF DATA

    Data:

    1. Sprint Sales
    2. Sprint vs Sprint Limited Edition
    3. Email Marketing Campaign

    Sprint Sales Data

    sprint sales graph The sales of the Sprint scooter started off strong, with over 100 units sold in the first few days of the month.
    The sales then declined steadily, with only a few units sold in the last few days of the month.
    The average daily sales for the month were around 50 units.
    sprint sales graph The growth rate of the Sprint scooter was positive in the first 7 days after launch, but it declined steadily after that.
    The growth rate of the Sprint scooter became negative on day 15 and remained negative for the rest of the month.
    The average daily growth rate for the month was -2%.
    Based on these insights, it is clear that there was a decline in the sales of the Sprint scooter in October 2016. The decline in sales started after the initial hype around the product died down.

    Sprint vs Sprint Limited Edition

    sprint vs sprint-le sales graph On comparing "Sprint" with "Sprint Limited Edition" sales data we see that:
    The Sprint scooter had higher sales in the first few days after launch, but the Sprint LE scooter eventually caught up and surpassed it.
    The Sprint scooter had a sharp decline in sales after day 10, while the Sprint LE scooter had a more gradual decline.
    The Sprint scooter had a higher average daily sales than the Sprint LE scooter.
    sprint vs sprint-le sales graph The Sprint scooter had a higher growth rate in the first 7 days after launch than the Sprint LE scooter.
    The Sprint scooter's growth rate then declined, while the Sprint LE scooter's growth rate remained relatively stable.
    The Sprint scooter's growth rate eventually became negative, while the Sprint LE scooter's growth rate remained positive.
    Based on these insights, it is possible that the launch date did have an impact on the sales of the two scooters. The Sprint scooter may have been more hyped up before launch, which could have led to higher initial sales and a higher growth rate.

    Email Marketing Campaign

    emails campaign The open rate for the campaign was 20%, which is slightly above the industry benchmark of 18%. However, the click-through rate was only 2%, which is below the industry benchmark of 8%. This means that only 2% of the people who opened the email clicked on a link in the email.
    Based on the insights it is clear the the email campaign was not effective in influencing customer's purchasing decision.

    INSIGHTS

    Below are the general insights:

    • The sales performance of Sprint Scooters exhibited an impressive start during the first two weeks of its launch. However, unexpectedly, there was a sudden decrease in sales by 20%.
    • The sales volume showed promise during the initial week of release, up until October 17. However, it began to decline steadily thereafter.
    • The Sprint scooter had a sharp decline in sales after day 10, while the Sprint LE scooter had a more gradual decline.
    • The Sprint scooter's growth rate eventually became negative, while the Sprint LE scooter's growth rate remained positive.
    • Only 2% of the people who opened the email clicked on a link in the email.

    WHAT I LEARNED

      Below is what I learned/practiced from this project:
    • Complex SQL queries to extract required data for analysis.
    • Use of table joins and aggregate window functions like rolling sum over a specified no. of rows, Analytical function, sub-query and common table expressions (cte).
    • Pivot tables and graphs in Excel, edited the data tables along with creating different charts using pivot tables.

    RESOURCES

    I found this project/micro-experience on bluetick.ai website.